Gated Multimodal Units for Information Fusion

Authors

  • John Edison Arevalo Ovalle
  • Thamar Solorio
  • Manuel Montes-y-Gómez
  • Fabio A. González
Abstract

This paper presents a novel model for multimodal learning based on gated neural networks. The Gated Multimodal Unit (GMU) model is intended to be used as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities. The GMU learns to decide how modalities influence the activation of the unit using multiplicative gates. It was evaluated on a multilabel scenario for genre classification of movies using the plot and the poster. The GMU improved the macro F-score performance of single-modality approaches and outperformed other fusion strategies, including mixture-of-experts models. Along with this work, the MM-IMDb dataset is released, which, to the best of our knowledge, is the largest publicly available multimodal dataset for genre prediction on movies.
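The gating idea described in the abstract can be illustrated with a minimal NumPy sketch of a bimodal unit: each modality is transformed into a hidden feature, and a learned multiplicative gate decides, per dimension, how much each modality contributes to the fused representation. This is a simplified illustration, not the authors' exact implementation; the function and weight names are assumptions for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gmu_bimodal(x_v, x_t, W_v, W_t, W_z):
    """Bimodal gated fusion sketch: a gate z in (0, 1), computed from both
    inputs, mixes the per-modality hidden features multiplicatively."""
    h_v = np.tanh(W_v @ x_v)                        # visual hidden feature
    h_t = np.tanh(W_t @ x_t)                        # textual hidden feature
    z = sigmoid(W_z @ np.concatenate([x_v, x_t]))   # multiplicative gate
    return z * h_v + (1.0 - z) * h_t                # gated combination

# Toy usage with random weights (dimensions are illustrative).
rng = np.random.default_rng(0)
dv, dt, dh = 8, 5, 4
h = gmu_bimodal(rng.standard_normal(dv), rng.standard_normal(dt),
                rng.standard_normal((dh, dv)), rng.standard_normal((dh, dt)),
                rng.standard_normal((dh, dv + dt)))
print(h.shape)  # (4,)
```

Because the output is a convex combination of two tanh-activated features, each fused component stays in (-1, 1); in the paper the gate weights are trained end-to-end with the rest of the network.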


Similar Articles

Multimodal medical image fusion based on Yager’s intuitionistic fuzzy sets

The objective of image fusion for medical images is to combine multiple images obtained from various sources into a single image suitable for better diagnosis. Most state-of-the-art image fusion techniques are based on non-fuzzy sets, and the fused images so obtained lack complementary information. Intuitionistic fuzzy sets (IFS) are determined to be more suitable for civilian, and medi...


MEMN: Multimodal Emotional Memory Network for Emotion Recognition in Dyadic Conversational Videos

Multimodal emotion recognition is a developing field of research which aims at detecting emotions in videos. For conversational videos, current methods mostly ignore the role of inter-speaker dependency relations while classifying emotions. In this paper, we address recognizing utterance-level emotions in dyadic conversations. We propose a deep neural framework, termed Multimodal Emotional Memo...


Ontology-based multimodal high level fusion involving natural language analysis for aged people home care application

This paper presents a knowledge-based method of early-stage high level multimodal fusion of data obtained from speech input and visual scene. The ultimate goal is to develop a human-computer multimodal interface to assist elderly people living alone at home to perform their daily activities, and to support their active ageing and social cohesion. Crucial for multimodal high level fusion and suc...


Performance Evaluation of Multimodal Multifeature Authentication System Using KNN Classification

This research proposes a multimodal multifeature biometric system for human recognition using two traits, that is, palmprint and iris. The purpose of this research is to analyse integration of multimodal and multifeature biometric system using feature level fusion to achieve better performance. The main aim of the proposed system is to increase the recognition accuracy using feature level fusio...


Comparative Study of Different Fusion Techniques in Multimodal Biometric Authentication

Multimodal biometric authentication resolves a number of issues present in unimodal biometrics. There are several ways to fuse different modalities in multimodal biometrics: fusion can occur either before or after the matching of scores. The presented research paper deals with the comparative study of different techniques which perform fusion of information after matching...



Journal:
  • CoRR

Volume abs/1702.01992  Issue 

Pages  -

Publication date 2017